The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
translated by 谷歌翻译
The foundation models have recently shown excellent performance on a variety of downstream tasks in computer vision. However, most existing vision foundation models simply focus on image-level pretraining and adpation, which are limited for dynamic and complex video-level understanding tasks. To fill the gap, we present general video foundation models, InternVideo, by taking advantage of both generative and discriminative self-supervised video learning. Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications. Without bells and whistles, InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications. Especially, our methods can obtain 91.1% and 77.2% top-1 accuracy on the challenging Kinetics-400 and Something-Something V2 benchmarks, respectively. All of these results effectively show the generality of our InternVideo for video understanding. The code will be released at https://github.com/OpenGVLab/InternVideo .
translated by 谷歌翻译
作为智能机器人的一项基本任务,Visual Slam在过去几十年中取得了长足的进步。但是,在高度弱质地的环境下,强大的大满贯仍然非常具有挑战性。在本文中,我们提出了一个名为RWT-Slam的新型视觉大满贯系统,以解决这个问题。我们修改LOFTR网络,该网络能够在低纹理的场景下产生密集的点匹配以生成特征描述符。为了将新功能集成到流行的Orb-Slam框架中,我们开发了功能面具,以滤除不可靠的功能并采用KNN策略来增强匹配的鲁棒性。我们还对新的描述符进行了视觉词汇,以有效地循环结束。在TUM和Openloris等各种公共数据集以及我们自己的数据中测试了由此产生的RWT-SLAM。结果显示在高度弱质地的环境下表现非常有希望。
translated by 谷歌翻译
近年来,由于深度学习技术的发展,LiDar Point Clouds的3D对象检测取得了长足的进步。尽管基于体素或基于点的方法在3D对象检测中很受欢迎,但它们通常涉及耗时的操作,例如有关体素的3D卷积或点之间的球查询,从而使所得网络不适合时间关键应用程序。另一方面,基于2D视图的方法具有较高的计算效率,而通常比基于体素或基于点的方法获得的性能低。在这项工作中,我们提出了一个基于实时视图的单阶段3D对象检测器,即CVFNET完成此任务。为了在苛刻的效率条件下加强跨视图的学习,我们的框架提取了不同视图的特征,并以有效的渐进式方式融合了它们。我们首先提出了一个新颖的点范围特征融合模块,该模块在多个阶段深入整合点和范围视图特征。然后,当将所获得的深点视图转换为鸟类视图时,特殊的切片柱旨在很好地维护3D几何形状。为了更好地平衡样品比率,提出了一个稀疏的柱子检测头,将检测集中在非空网上。我们对流行的Kitti和Nuscenes基准进行了实验,并以准确性和速度来实现最先进的性能。
translated by 谷歌翻译
尽管在许多领域都有成功的应用,但如今的机器学习模型遭受了臭名昭著的问题,例如脆弱性,对对抗性例子。除了陷入对抗攻击和防御之间的猫与小鼠游戏之外,本文还提供了替代观点来考虑对抗性示例,并探索我们是否可以在良性应用中利用它。我们首先将对抗性示例归因于使用非语义特征的人类模型差异。尽管在经典的机器学习机制中很大程度上被忽略了,但非语义功能具有三个有趣的特征,因为(1)模型独有,(2)对推理至关重要,以及(3)可利用的功能。受到这一点的启发,我们提出了良性的对抗性攻击的新想法,以利用三个方向的对抗性示例以善良:(1)对抗性图灵测试,(2)拒绝恶意模型应用,以及(3)对抗性数据扩增。每个方向都以动机详细说明,理由分析和原型应用来展示其潜力。
translated by 谷歌翻译
Making safe and human-like decisions is an essential capability of autonomous driving systems and learning-based behavior planning is a promising pathway toward this objective. Distinguished from existing learning-based methods that directly output decisions, this work introduces a predictive behavior planning framework that learns to predict and evaluate from human driving data. Concretely, a behavior generation module first produces a diverse set of candidate behaviors in the form of trajectory proposals. Then the proposed conditional motion prediction network is employed to forecast other agents' future trajectories conditioned on each trajectory proposal. Given the candidate plans and associated prediction results, we learn a scoring module to evaluate the plans using maximum entropy inverse reinforcement learning (IRL). We conduct comprehensive experiments to validate the proposed framework on a large-scale real-world urban driving dataset. The results reveal that the conditional prediction model is able to forecast multiple possible future trajectories given a candidate behavior and the prediction results are reactive to different plans. Moreover, the IRL-based scoring module can properly evaluate the trajectory proposals and select close-to-human ones. The proposed framework outperforms other baseline methods in terms of similarity to human driving trajectories. Moreover, we find that the conditional prediction model can improve both prediction and planning performance compared to the non-conditional model, and learning the scoring module is critical to correctly evaluating the candidate plans to align with human drivers.
translated by 谷歌翻译
Background and Purpose: Colorectal cancer is a common fatal malignancy, the fourth most common cancer in men, and the third most common cancer in women worldwide. Timely detection of cancer in its early stages is essential for treating the disease. Currently, there is a lack of datasets for histopathological image segmentation of rectal cancer, which often hampers the assessment accuracy when computer technology is used to aid in diagnosis. Methods: This present study provided a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg). To demonstrate the validity and extensiveness of EBHI-Seg, the experimental results for EBHI-Seg are evaluated using classical machine learning methods and deep learning methods. Results: The experimental results showed that deep learning methods had a better image segmentation performance when utilizing EBHI-Seg. The maximum accuracy of the Dice evaluation metric for the classical machine learning method is 0.948, while the Dice evaluation metric for the deep learning method is 0.965. Conclusion: This publicly available dataset contained 5,170 images of six types of tumor differentiation stages and the corresponding ground truth images. The dataset can provide researchers with new segmentation algorithms for medical diagnosis of colorectal cancer, which can be used in the clinical setting to help doctors and patients.
translated by 谷歌翻译
内存处理(PIM)是一种越来越多地研究的神经形态硬件,承诺能量和吞吐量改进以进行深度学习推断。 PIM利用大量平行,有效的模拟计算在内存内部,绕过传统数字硬件中数据移动的瓶颈。但是,需要额外的量化步骤(即PIM量化),通常由于硬件约束而导致的分辨率有限,才能将模拟计算结果转换为数字域。同时,由于不完善的类似物到数字界面,PIM量化中的非理想效应广泛存在,这进一步损害了推理的准确性。在本文中,我们提出了一种培训量化网络的方法,以合并PIM量化,这对所有PIM系统无处不在。具体而言,我们提出了PIM量化意识培训(PIM-QAT)算法,并通过分析训练动力学以促进训练收敛,从而在向后传播期间引入重新传播技术。我们还提出了两种技术,即批处理归一化(BN)校准和调整精度训练,以抑制实际PIM芯片中涉及的非理想线性和随机热噪声的不利影响。我们的方法在三个主流PIM分解方案上进行了验证,并在原型芯片上进行了物理上的验证。与直接在PIM系统上部署常规训练的量化模型相比,该模型没有考虑到此额外的量化步骤并因此失败,我们的方法提供了重大改进。它还可以在CIFAR10和CIFAR100数据集上使用各种网络深度来获得最受欢迎的网络拓扑结构,在CIFAR10和CIFAR100数据集上,在PIM系统上达到了可比的推理精度。
translated by 谷歌翻译
由于互动交通参与者的随机性质和道路结构的复杂性,城市自动驾驶的决策是具有挑战性的。尽管基于强化的学习(RL)决策计划有望处理城市驾驶方案,但它的样本效率低和适应性差。在本文中,我们提出了Scene-Rep Transformer,以通过更好的场景表示编码和顺序预测潜在蒸馏来提高RL决策能力。具体而言,构建了多阶段变压器(MST)编码器,不仅对自我车辆及其邻居之间的相互作用意识进行建模,而且对代理商及其候选路线之间的意图意识。具有自我监督学习目标的连续潜伏变压器(SLT)用于将未来的预测信息提炼成潜在的场景表示,以减少勘探空间并加快训练的速度。基于软演员批评的最终决策模块(SAC)将来自场景rep变压器的精制潜在场景表示输入,并输出驾驶动作。该框架在五个挑战性的模拟城市场景中得到了验证,其性能通过成功率,安全性和效率方面的数据效率和性能的大幅度提高来定量表现出来。定性结果表明,我们的框架能够提取邻居代理人的意图,以帮助做出决策并提供更多多元化的驾驶行为。
translated by 谷歌翻译
通用样式转移(UST)从任意参考图像中注入样式中的内容图像。现有的方法虽然享有许多实际的成功,但无法解释实验观察,包括UST算法的不同性能在保存内容图像的空间结构时。此外,方法仅限于对风格化的繁琐全局控制,因此它们需要其他空间掩码才能进行所需的风格化。在这项工作中,我们为UST的一般框架提供了系统的傅立叶分析。我们在频域中提出了框架的等效形式。该形式意味着现有算法平均处理特征图的所有频率组件和像素,但零频率组件除外。我们分别将傅立叶振幅和相位与革兰氏矩阵和样式转移的内容重建损失联系起来。因此,基于这种等效性和连接,我们可以解释具有傅立叶相的算法之间的不同结构保存行为​​。鉴于我们的解释,我们在实践中提出了两项​​操纵,以保存结构和所需的风格。定性和定量实验都证明了我们方法对最新方法的竞争性能。我们还进行实验以证明(1)上述等效性,(2)基于傅立叶幅度和相位的可解释性以及(3)与频率分量相关的可控性。
translated by 谷歌翻译